modular architecture
Is a Modular Architecture Enough?
Inspired from human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures. Recent work demonstrates that not only do some modular architectures generalize well, but they also lead to better out of distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data generating system for most real-world settings is considered to consist of sparse modular connections, and endowing models with similar inductive biases will be helpful. However, the field has been lacking in a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.
Discrete-Valued Neural Communication
Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. The nature of structured models is that communication among the components has a bottleneck, typically achieved by restricted connectivity and attention. In this work, we further tighten the bottleneck via discreteness of the representations transmitted between components. We hypothesize that this constraint serves as a useful form of inductive bias. Our hypothesis is motivated by past empirical work showing the benefits of discretization in non-structured architectures as well as our own theoretical results showing that discretization increases noise robustness and reduces the underlying dimensionality of the model.
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology (0.46)
- Education (0.46)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
KnowThyself: An Agentic Assistant for LLM Interpretability
Prasai, Suraj, Du, Mengnan, Zhang, Ying, Yang, Fan
We develop KnowThyself, an agentic assistant that advances large language model (LLM) interpretability. Existing tools provide useful insights but remain fragmented and code-intensive. KnowThyself consolidates these capabilities into a chat-based interface, where users can upload models, pose natural language questions, and obtain interactive visualizations with guided explanations. At its core, an orchestrator LLM first reformulates user queries, an agent router further directs them to specialized modules, and the outputs are finally contextualized into coherent explanations. This design lowers technical barriers and provides an extensible platform for LLM inspection. By embedding the whole process into a conversational workflow, KnowThyself offers a robust foundation for accessible LLM interpretability.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New Jersey (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (2 more...)
Platform-Agnostic Modular Architecture for Quantum Benchmarking
Patel, Neer, Giri, Anish, Patil, Hrushikesh Pramod, Siekierski, Noah, Chatterjee, Avimita, Johri, Sonika, Proctor, Timothy, Lubinski, Thomas, Niu, Siyuan
We present a platform-agnostic modular architecture that addresses the increasingly fragmented landscape of quantum computing benchmarking by decoupling problem generation, circuit execution, and results analysis into independent, interoperable components. Supporting over 20 benchmark variants ranging from simple algorithmic tests like Bernstein-Vazirani to complex Hamiltonian simulation with observable calculations, the system integrates with multiple circuit generation APIs (Qiskit, CUDA-Q, Cirq) and enables diverse workflows. We validate the architecture through successful integration with Sandia's $\textit{pyGSTi}$ for advanced circuit analysis and CUDA-Q for multi-GPU HPC simulations. Extensibility of the system is demonstrated by implementing dynamic circuit variants of existing benchmarks and a new quantum reinforcement learning benchmark, which become readily available across multiple execution and analysis modes. Our primary contribution is identifying and formalizing modular interfaces that enable interoperability between incompatible benchmarking frameworks, demonstrating that standardized interfaces reduce ecosystem fragmentation while preserving optimization flexibility. This architecture has been developed as a key enhancement to the continually evolving QED-C Application-Oriented Performance Benchmarks for Quantum Computing suite.
- North America > United States > Florida > Orange County > Orlando (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- (8 more...)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology (0.46)
- Education (0.46)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
The Anatomy of a Personal Health Agent
Heydari, A. Ali, Gu, Ken, Srinivas, Vidya, Yu, Hong, Zhang, Zhihan, Zhang, Yuwei, Paruchuri, Akshay, He, Qian, Palangi, Hamid, Hammerquist, Nova, Metwally, Ahmed A., Winslow, Brent, Kim, Yubin, Ayush, Kumar, Yang, Yuzhe, Narayanswamy, Girish, Xu, Maxwell A., Garrison, Jake, Lee, Amy Armento, Vafeiadou, Jenny, Graef, Ben, Galatzer-Levy, Isaac R., Schenck, Erik, Barakat, Andrew, Perez, Javier, Shreibati, Jacqueline, Hernandez, John, Faranesh, Anthony Z., Prieto, Javier L., Heneghan, Connor, Liu, Yun, Zhan, Jiening, Malhotra, Mark, Patel, Shwetak, Althoff, Tim, Liu, Xin, McDuff, Daniel, Xu, Xuhai "Orson"
Health is a fundamental pillar of human wellness, and the rapid advancements in large language models (LLMs) have driven the development of a new generation of health agents. However, the application of health agents to fulfill the diverse needs of individuals in daily non-clinical settings is underexplored. In this work, we aim to build a comprehensive personal health agent that is able to reason about multimodal data from everyday consumer wellness devices and common personal health records, and provide personalized health recommendations. To understand end-users' needs when interacting with such an assistant, we conducted an in-depth analysis of web search and health forum queries, alongside qualitative insights from users and health experts gathered through a user-centered design process. Based on these findings, we identified three major categories of consumer health needs, each of which is supported by a specialist sub-agent: (1) a data science agent that analyzes personal time-series wearable and health record data, (2) a health domain expert agent that integrates users' health and contextual data to generate accurate, personalized insights, and (3) a health coach agent that synthesizes data insights, guiding users using a specified psychological strategy and tracking users' progress. Furthermore, we propose and develop the Personal Health Agent (PHA), a multi-agent framework that enables dynamic, personalized interactions to address individual health needs. To evaluate each sub-agent and the multi-agent system, we conducted automated and human evaluations across 10 benchmark tasks, involving more than 7,000 annotations and 1,100 hours of effort from health experts and end-users. Our work represents the most comprehensive evaluation of a health agent to date and establishes a strong foundation towards the futuristic vision of a personal health agent accessible to everyone.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Iraq (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- (2 more...)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario (0.04)
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts
Arnob, Samin Yeasar, Su, Zhan, Kim, Minseon, Ostapenko, Oleksiy, Ohib, Riyasat, Saleh, Esra'a, Precup, Doina, Caccia, Lucas, Sordoni, Alessandro
Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be rapidly adapted on the fly for specific downstream tasks, without requiring additional fine-tuning. Typically, LoRA serves as the foundational building block of such parameter-efficient modular architectures, leveraging low-rank weight structures to reduce the number of trainable parameters. In this paper, we study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature and surprisingly outperforms both LoRA and full fine-tuning in our setting. Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks, thus scaling beyond what is usually studied in the literature. Our findings demonstrate that sparse adapters yield superior in-distribution performance post-merging compared to LoRA or full model merging. Achieving strong held-out performance remains a challenge for all methods considered.